- Analytical
- Maximum likelihood
- Resampling
- Bayesian
Mean undulation rate for \(n = 8\) gliding snakes:
undulation_rate <- c(0.9, 1.2, 1.2, 1.3, 1.4, 1.4, 1.6, 2.0)
What is the mean undulation rate for this sample of flying snakes?
Arithmetic mean:
\[\hat{Y} = \frac{\sum_{i=1}^{n}Y_i}{n}\]
\[mean~undulation~rate = \frac{\sum_{i=1}^{n}undulation~rate_i}{n}\]
sum(undulation_rate) / length(undulation_rate)
## [1] 1.375
mean(undulation_rate)
## [1] 1.375
Use dnorm() to calculate the relative likelihood of an observed value \(Y_i\) drawn from a normal distribution given a mean (\(\mu\)) and standard deviation (\(\sigma\)).
\[f\left(Y_i; \mu, \sigma\right) = \frac{1}{\sqrt{2\pi\sigma^{2}}} e^{\frac{-\left(Y_i-\mu\right)^{2}}{2\sigma^{2}}}\]
dnorm(0, mean = 0, sd = 1)
## [1] 0.3989423
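As a check, evaluating the normal density formula above by hand reproduces dnorm()'s value (a minimal sketch; the variable names are illustrative):

```r
# Normal density evaluated directly at Y = 0 with mu = 0, sigma = 1
mu <- 0
sigma <- 1
y <- 0
(1 / sqrt(2 * pi * sigma^2)) * exp(-(y - mu)^2 / (2 * sigma^2))
## [1] 0.3989423
```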
Hypothesizing that the population mean is 0 and the standard deviation is 1, what is the likelihood of the observed values?
For a set of observations (\(Y_i\)) and hypothesized parameters (\(\Theta\); i.e., mean and standard deviation) the model likelihood is the product of the observations’ individual likelihoods:
\[\mathcal{L}\left(\left\{ Y_{i}\right\} _{i=1}^{n};\Theta\right) = \prod_{i=1}^{n}\phi\left(Y_{i}; \Theta\right)\]
\[\log\left(\mathcal{L}\left(\left\{ Y_{i}\right\} _{i=1}^{n};\Theta\right)\right) = \sum_{i=1}^{n} \log\left(\phi\left(Y_{i};\Theta\right)\right)\]
Hypothesizing that the population mean is 0 and the standard deviation is 1, what is the likelihood of the observed values?
Likelihood for the first observation (undulation_rate[1]):
undulation_rate[1]
## [1] 0.9
dnorm(undulation_rate[1], mean = 0, sd = 1)
## [1] 0.2660852
This is only the likelihood for one observation. We need the likelihoods for all 8 undulation rates to get a model likelihood.
Vector of likelihoods for all values in undulation_rate given mu = 0 and sigma = 1:
(rel_liks <- dnorm(undulation_rate, mean = 0, sd = 1))
## [1] 0.26608525 0.19418605 0.19418605 0.17136859 0.14972747 0.14972747 0.11092083
## [8] 0.05399097
Model likelihood is the product of those likelihoods:
(lik <- prod(rel_liks))
## [1] 2.308476e-07
log(lik)
## [1] -15.28151
Rather than logging the product, we can sum the log-likelihoods:
sum(log(rel_liks))
## [1] -15.28151
For a model in which the mean is 0 and standard deviation is 1, the model log-likelihood is -15.28.
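Summing log-likelihoods is not just algebraically equivalent, it is numerically safer: with enough observations, the raw product underflows double precision. A small illustration (the likelihood values here are made up for demonstration):

```r
# 1000 likelihoods of 0.1 each: the product underflows to exactly 0,
# but the sum of the logs stays finite
prod(rep(0.1, 1000))      # underflows to 0
sum(log(rep(0.1, 1000)))  # 1000 * log(0.1) = -2302.585
```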
Is there another combination of \(\mu\) and \(\sigma\) that gives a higher likelihood (= larger log-likelihood)?
Try \(\mu = 1\) and \(\sigma = 1\):
sum(log(dnorm(undulation_rate, mean = 1, sd = 1)))
## [1] -8.281508
This is an improvement over \(\mu = 0\) and \(\sigma = 1\).
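As a further informal check, plugging the sample mean and standard deviation into the same calculation gives a much higher log-likelihood still:

```r
undulation_rate <- c(0.9, 1.2, 1.2, 1.3, 1.4, 1.4, 1.6, 2.0)
sum(log(dnorm(undulation_rate,
              mean = mean(undulation_rate),
              sd = sd(undulation_rate))))
## about -1.84
```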
Find the combination of \(\mu\) and \(\sigma\) that maximizes the log-likelihood of the model for the mean and standard deviation of undulation rates.
Ranges of possible values:
For each combination of \(\mu\) and \(\sigma\), calculate the model log-likelihood. Pick the combination with the largest log-likelihood as the parameter estimates.
Set up the grid:
library(tidyr)  # for crossing()

n <- 100                            # How fine is the grid?
mu <- seq(0.1, 3, length = n)       # Vector of mu
sigma <- seq(0.1, 0.5, length = n)  # Vector of sigma
grid_approx <- crossing(mu, sigma)
grid_approx
## # A tibble: 10,000 x 2
##       mu sigma
##    <dbl> <dbl>
##  1   0.1 0.1  
##  2   0.1 0.104
##  3   0.1 0.108
##  4   0.1 0.112
##  5   0.1 0.116
##  6   0.1 0.120
##  7   0.1 0.124
##  8   0.1 0.128
##  9   0.1 0.132
## 10   0.1 0.136
## # … with 9,990 more rows
log_lik <- numeric(length = nrow(grid_approx))
for (ii in 1:nrow(grid_approx)) {
log_lik[ii] <-
sum(dnorm(undulation_rate,
mean = grid_approx$mu[ii],
sd = grid_approx$sigma[ii],
log = TRUE))
}
grid_approx$log_lik <- log_lik
grid_approx now maps each combination of mu and sigma to its log_lik:
grid_approx
## # A tibble: 10,000 x 3
##       mu sigma log_lik
##    <dbl> <dbl>   <dbl>
##  1   0.1 0.1     -676.
##  2   0.1 0.104   -624.
##  3   0.1 0.108   -578.
##  4   0.1 0.112   -536.
##  5   0.1 0.116   -499.
##  6   0.1 0.120   -466.
##  7   0.1 0.124   -436.
##  8   0.1 0.128   -408.
##  9   0.1 0.132   -384.
## 10   0.1 0.136   -361.
## # … with 9,990 more rows
On this grid, the maximum likelihood estimates of \(\mu\) and \(\sigma\) are:
grid_approx[which.max(grid_approx$log_lik), ]
##            mu     sigma   log_lik
## 4451 1.388889 0.3020202 -1.810766
The analytical estimates are:
mean(undulation_rate)
## [1] 1.375
sd(undulation_rate)
## [1] 0.324037
Search for the most likely values of \(\mu\) and \(\sigma\) across all possible values.
Define a function that takes a vector of values to optimize x (\(\mu\) and \(\sigma\)) as well as a set of data Y and returns the log-likelihood:
log_lik <- function(x, Y){
liks <- dnorm(Y, mean = x[1], sd = x[2], log = TRUE)
return(sum(liks))
}
We can now simultaneously optimize \(\mu\) and \(\sigma\), maximizing the log-likelihood.
The control option reltol tells optim() to stop when the relative improvement is \(<10^{-100}\).
optim(c(0.1, 0.1), # Start at 0.1, 0.1
log_lik,
Y = undulation_rate,
control = list(fnscale = -1,
reltol = 10^-100))
## $par
## [1] 1.3750000 0.3031089
## 
## $value
## [1] -1.802203
## 
## $counts
## function gradient 
##      175       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
glm() fits generalized linear models via optimization:
fm <- glm(undulation_rate ~ 1)  # Estimate a mean only
coef(fm)
## (Intercept) ## 1.375
logLik(fm)
## 'log Lik.' -1.802203 (df=2)
For a small enough tolerance, the maximum likelihood estimate of the mean equals the analytical estimate.
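Note that the ML estimate of \(\sigma\) (0.303 from optim()) differs from sd()'s 0.324: the ML estimator divides the sum of squared deviations by \(n\) rather than \(n - 1\). Rescaling sd() recovers the ML value:

```r
undulation_rate <- c(0.9, 1.2, 1.2, 1.3, 1.4, 1.4, 1.6, 2.0)
n <- length(undulation_rate)

# sd() uses n - 1; multiplying by sqrt((n - 1) / n) gives the ML estimate
sd(undulation_rate) * sqrt((n - 1) / n)
## [1] 0.3031089
```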
set.seed(36428)
reps <- 100000
resampled_mean <- numeric(length = reps)
for (i in 1:reps) {
resampled_mean[i] <- mean(sample(undulation_rate,
replace = TRUE))
}
mean(undulation_rate)
## [1] 1.375
mean(resampled_mean)
## [1] 1.375399
Given enough iterations, the resampled mean converges to the analytical mean (which equals the maximum likelihood mean).
Maximum likelihood inference:
Bayesian inference:
Ranges of possible maximum likelihood values:
Drawbacks:
Can we do better? Yes, Bayesian priors.
Cauchy distribution (location = 0, scale = 3)
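For intuition about how weakly informative this prior is, its density can be evaluated with dcauchy() (the candidate values of \(\mu\) below are chosen for illustration):

```r
# Cauchy(location = 0, scale = 3) prior density at a few candidate means;
# the heavy tails leave non-negligible probability even far from 0
dcauchy(c(0, 1.375, 5, 10), location = 0, scale = 3)
```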
Stan code:
model <- "
data{
int<lower=1> N;
real undulation_rate[N];
}
parameters{
real<lower=0> mu;
real<lower=0> sigma;
}
model{
mu ~ cauchy(0, 3);
sigma ~ exponential(1);
undulation_rate ~ normal(mu, sigma);
}
"
library(rstan)

fm_priors <- stan(
model_code = model,
data = list(undulation_rate = undulation_rate,
N = length(undulation_rate)),
iter = 1e5,
chains = 4)
## Inference for Stan model: bd7dbec9aa08d046d627dd6f83a1db7d.
## 4 chains, each with iter=1e+05; warmup=50000; thin=1;
## post-warmup draws per chain=50000, total post-warmup draws=2e+05.
## 
##        mean se_mean    sd   2.5%   25%   50%   75% 97.5% n_eff Rhat
## mu    1.370   0.001 0.144  1.080 1.284 1.371 1.456 1.655 79446    1
## sigma 0.385   0.000 0.127  0.221 0.298 0.358 0.440 0.705 71227    1
## lp__  3.046   0.005 1.150 -0.045 2.599 3.399 3.866 4.167 53932    1
## 
## Samples were drawn using NUTS(diag_e) at Mon Jan  4 14:41:41 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
The posterior mean of \(\mu\) (1.370) is slightly lower than the analytical and ML estimates (1.375) because the prior places more probability on lower values.